Bring OCR to Mealie for importing scanned recipes #1244
FYI: rebasing should get CI sorted. It needs the changes from #1252.
5863804 to 986c068
Thanks, I missed this line in the feedback. Yes, it sounds like the type checker is pretty angry at me for being sloppy. I'll get it to work.
These are just my cursory review comments. I haven't done a thorough review and there will likely be more changes required to get this merged in.
Before I dig too deep into this, I would like to see a more thorough write-up of the feature overall and how it works.
Some critical areas I see that need more documentation are frontend/pages/recipe/_slug/ocr-editor.vue and some context around the tests that you've written: what is being tested, and how we can validate the skipped test if the output is somewhat unreliable.
Looks really cool so far, thanks for your work on this one!
Added a design section to explain all the relevant work that makes this possible. Hopefully the block of text is not too hard to read.
8e82e38 to a1bdaf4
Rebased to fix merge conflicts with the mealie-next branch.
I have tried using the feature to add a bunch of recipes to see how it does. And... it's outputting gibberish when jpg files include a lot of text, which is the worst user experience imaginable. There is likely a lot of pre-processing to do, which would make it much slower than it is currently. I'm a little bit disappointed with Tesseract. I'm going to investigate further as to why it is behaving this way. Putting the PR temporarily in draft again.
I can't keep rebasing this amount of changes every time, so I have marked the PR as ready for review. Feedback would be greatly appreciated so we can merge this as soon as possible.
Any updates on this comment?
For now, I have restricted the image format to png. I have found no help on Tesseract's side, though I did not look too far. There is also the option of converting any input image to png, then using the png to do the OCR. This adds a huge overhead, which would mean I could not afford to ask the server to recognize characters in the image every time the OCR editor is loaded, as it does now.
I tried to go through and clean up what was left to get it merged, but there was too much going on in the Vue component for me to dig through and fix everything. Maybe when I have some more time I can go through it and clean it up, but I left some comments on the issues that I'm seeing.
The biggest problem with the component is that there is just so much going on and it's difficult to group what belongs together. What I was going to do was break the functions and state that go together into separate composables and place those in a different file to allow for logical grouping of items. Maybe related: there were so many TypeScript errors in VSCode from Volar that it made reviewing extremely difficult; not sure what the issue was there.
btw, you've got a __init.py__ file in the mealie/services/ocr that needs to be fixed.
Like I said, I tried to get it to a good point, but it was more of a weekend project. Will hopefully take another crack at it this weekend if you don't get to it first.
I can do the cleanup if you think it should be done before it is merged. One of the reasons I have not done it yet is that most of the script mess is event handlers for the canvas, which means that if I move the canvas to its own component, for example, most of the current functions will follow, hence just moving the mess. Most of it is math or helper functions to prevent code duplication. I'll try to make it look as good as I can and you let me know what you think. The least I can do is give it a try; it will be easier for me to clean up since I wrote all those things.
A more general note: Volar is complaining only in the template about the recipe returned by useRecipe because the |
Merged as a part of #1670 with a few minor things cleaned up. Thanks for sticking with this one until the end. 🎉
Added so far
As a first draft, the recipe is created with the recognized text in the description field to copy and paste later in edit mode. (no more)
2022-07-28 Update: I'm at a point where I think the component is very usable and open for reviews to merge something that is very close to what is already implemented.
Before merging
Here is the list of tasks to complete before we consider it acceptable to merge this code into the beta:
Add mode to highlight boxes of recognized text on the image
Recognized text is highlighted on component mount
Nice to haves
Design
This new feature is based on previous experience with a similar software solution called Esker.
The process that I have designed for now lets the user use a new creation page, /recipe/create/ocr, letting them upload a picture, optionally making it the recipe thumbnail. This creates a recipe called "New OCR Recipe" with the uploaded picture as an asset called "Original recipe image". Additionally, a new column in the recipes table registers that this recipe is an OCR recipe.
The user is directed to the page "recipe/_slug/ocr-editor", where they can use the image they uploaded to fill the usual recipe fields on the right part of the page. When this page is mounted, it sends the asset name to the backend for it to send back the text contained inside and its position.
Two modes are available.
In selection mode, the user can draw a rectangle; the identified text will appear under the canvas. The user can then select any recipe field on the right, then click anywhere inside the rectangle. This will take whatever text is fully contained in the rectangle and overwrite the field that was last selected.
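The "fully contained" rule above can be sketched as a simple bounding-box test. This is only an illustration in Python, not the code from the ocr-editor component: the Box and Word shapes and the names normalize and words_in_selection are hypothetical, and the coordinates are assumed to be image pixels with the origin at the top left.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixels (hypothetical shape)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, other: "Box") -> bool:
        # True only when every edge of `other` lies inside this box.
        return (
            self.x <= other.x
            and self.y <= other.y
            and other.x + other.width <= self.x + self.width
            and other.y + other.height <= self.y + self.height
        )


@dataclass(frozen=True)
class Word:
    """One recognized word and where it sits on the image (assumed shape)."""
    text: str
    box: Box


def normalize(x0: float, y0: float, x1: float, y1: float) -> Box:
    """Build a Box from two drag corners, whatever direction the user dragged."""
    return Box(min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))


def words_in_selection(words: List[Word], selection: Box) -> List[Word]:
    """Keep only the words whose boxes are fully inside the selection."""
    return [w for w in words if selection.contains(w.box)]


words = [
    Word("Flour", Box(10, 10, 40, 12)),
    Word("Sugar", Box(60, 10, 40, 12)),  # sticks out past the right edge
    Word("Eggs", Box(10, 30, 30, 12)),   # below the selection
]
# Dragging from bottom-right to top-left still yields a valid rectangle.
selection = normalize(70, 25, 5, 5)
selected_text = " ".join(w.text for w in words_in_selection(words, selection))
```

Normalizing the two drag corners first keeps the containment test independent of the direction in which the rectangle was drawn.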
The bulk add buttons will spawn a dialog with the selected text (that is, the text under the drawn rectangle) inside it.
This is where the Split text modes come into play: they let the user choose whether to keep all line breaks. For example, if a recipe book lists one ingredient per line, the user can select the whole list, press bulk add on the ingredient tab, and add all ingredients in 2 clicks.
The flatten mode removes all line breaks, and the blocks mode puts line breaks between the blocks identified by Tesseract. The blocks mode is pretty useful for instructions, which usually come as multiple paragraphs in the form of blocks, making it easier to use the bulk add dialog, this time for instructions.
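As a rough illustration of the three Split text behaviours, here is a Python sketch. It assumes each recognized word carries a block number and a line number and that the words arrive in reading order; the OcrWord shape, the split_text name, and the mode name "lines" (for keeping every line break) are hypothetical, while "flatten" and "blocks" follow the description above.

```python
from itertools import groupby
from typing import List, NamedTuple


class OcrWord(NamedTuple):
    """Assumed shape: one word with the block and line it was found in."""
    block: int
    line: int
    text: str


def split_text(words: List[OcrWord], mode: str) -> str:
    """Join words according to the chosen split mode.

    Words are assumed to be in reading order, so groupby only has to
    merge consecutive words that share the same block or line.
    """
    if mode == "flatten":
        # No line breaks at all.
        return " ".join(w.text for w in words)
    if mode == "lines":
        # Hypothetical name: keep every line break from the image.
        key = lambda w: (w.block, w.line)
    elif mode == "blocks":
        # One line per recognized block.
        key = lambda w: w.block
    else:
        raise ValueError(f"unknown split mode: {mode}")
    return "\n".join(
        " ".join(w.text for w in group) for _, group in groupby(words, key=key)
    )


words = [
    OcrWord(1, 1, "2"), OcrWord(1, 1, "eggs"),
    OcrWord(1, 2, "1"), OcrWord(1, 2, "cup"), OcrWord(1, 2, "flour"),
    OcrWord(2, 1, "Mix"), OcrWord(2, 2, "well"),
]
by_line = split_text(words, "lines")
by_block = split_text(words, "blocks")
flat = split_text(words, "flatten")
```

With an ingredient list, the line-preserving mode yields one ingredient per line for the bulk add dialog, while the blocks mode keeps each paragraph of instructions together.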
For recipes that are called New OCR Recipe (n), that is, names matching the regex /New\sOCR\sRecipe(\s\([0-9]+\))?/g, the ocr-editor component will take the biggest block with the fewest words, assume it is the recipe's title, and populate the recipe name field with it. This is done with the function findRecipeTitle in the ocr-editor component.
When the user is happy with the edits, the recipe can be saved the usual way.
They can come back to the OCR editor page by clicking the usual edit button and using the new button "OCR Editor" that will appear when the recipe is an OCR Recipe (hence the new table column).
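The default-name check and the title guess could look roughly like the Python sketch below. It is not the findRecipeTitle implementation: the regex is the one quoted above, but the Block shape, the function names, and the reading of "biggest block with the fewest words" as "tallest block, ties broken by fewer words" are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern quoted in the description; unanchored, so it is used with search().
DEFAULT_NAME = re.compile(r"New\sOCR\sRecipe(\s\([0-9]+\))?")


class Block(NamedTuple):
    """Assumed shape: a recognized block of text and its height in pixels."""
    text: str
    height: float


def has_default_name(name: str) -> bool:
    """True when the recipe still carries the placeholder OCR name."""
    return DEFAULT_NAME.search(name) is not None


def guess_title(blocks: List[Block]) -> Optional[str]:
    """Pick the tallest block, preferring fewer words on a tie (one possible reading)."""
    candidates = [b for b in blocks if b.text.strip()]
    if not candidates:
        return None
    best = max(candidates, key=lambda b: (b.height, -len(b.text.split())))
    # Collapse internal whitespace and line breaks into single spaces.
    return " ".join(best.text.split())


blocks = [
    Block("Serves 4", 12),
    Block("Banana Bread", 40),
    Block("Mix the flour and sugar together", 40),
]
recipe_name = "New OCR Recipe (3)"
if has_default_name(recipe_name):
    recipe_name = guess_title(blocks) or recipe_name
```

Falling back to the existing name when no block is found keeps the placeholder in place instead of leaving the field empty.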